Cloud storage with strong storage management has recently emerged in the big
data world; it can confirm data integrity and keep only a single copy of
duplicated data. Many cloud storage auditing techniques have been developed to
address the data deduplication (DD) problem, but they are vulnerable and
cannot resist brute force attacks (BFA). The existing methods also suffer from
privacy leakage. In this article, a new strategy called domain-user integra
tag (DUIT) is presented, which comprises inter and intra deduplication with a
file tag and a symmetric encryption key. The DUIT has two phases: the first is
random tag generation for intra deduplication, and the second is random
ciphertext (CT) generation for encryption. The benefit of the DUIT is that the
contents of an individual user's files are not revealed to the public; hence
we prove that the DUIT is protected from the BFA. Finally, an experiment was
conducted on a Linux machine with a C implementation. The results show that
our method reduces the computation cost (CC) by 27% and 35% and the searching
complexity (SC) by 10% and 26% compared with the previous methods. We conclude
that the DUIT achieves low CC and SC.
1. INTRODUCTION


Recently, cloud computing has advanced considerably, and cloud storage (CS) services have been
widely adopted by individuals and enterprises. Such systems need data deduplication (DD), in which a
single copy of the data is stored and duplicate copies are rejected. Because users may not want to expose
their sensitive data to the cloud or to other parties, they encrypt the data before outsourcing it. To encrypt
the data while still allowing duplicates to be recognized, convergent encryption (CE) schemes have been
proposed [1]-[4]. A proxy, deployed as a server or as a standalone device, can also be used [5]. Another
protection technique is data splitting, in which the data is divided into fragments that are stored in
different locations [6]. Fragmentation produces fragments such that a single fragment neither re-identifies
the subject to whom it corresponds nor reveals private data. In [7], [8]-[10], the data is stored entirely on
a cloud server; hence the user loses the right to control the data and is exposed to privacy leakage.
Existing privacy techniques rely entirely on encrypted data, but this may fail to protect the original
private information and gives up the ability to resist attacks.
In [11], a three-layer storage scheme was introduced with the aid of fog computing. In [12], a group
of vehicles that are geographically close to one another form a virtual cloud (VC) securely, anonymously,
and dynamically. The VC provides secure resource allocation and lets an authenticated client convey
messages. A cloud-based road condition monitoring scheme was introduced to monitor road conditions with
the aid of a cloud server [13], [14]. If a fragment contains only the values of a single attribute, then merely
knowing a list of those values is useless to an intruder, since the intruder cannot associate them with the
corresponding subjects [15]-[18]. In [19], various aspects of cloud computing security threats were
surveyed; that work notes that the cloud offers a wide variety of options and instant deployment of
selected services. A convergent encryption algorithm encrypts data with a key derived deterministically
from the data itself. To reduce the DD decentralization issue, [20] introduced a scheme that relies on the
decentralization of blockchain, needs no authorization from a third party, and performs encrypted DD
using a convergent key (CE). In [21], an ensemble dynamic optimization-based inverse adaptive heuristic
critic was proposed. The proposed strategy learns from expert observation and gives an approximate
solution when different workflows arrive online at different window times (WT). Users holding the same
file keep the same ciphertext (CT) in storage, and CE message encryption key (EK) management (MLE) is
used to extend privacy protection (PP).

Distributed DD systems with high reliability were presented to achieve stronger data security
[22]-[24]. To limit side-channel information leakage, [25] designed a role symmetric encryption proof of
ownership (PoW) scheme. In [26], a strong key-exposure resilient auditing scheme for secure CS was
proposed, together with a complete design. In this scheme, an auditor can check the integrity of cloud data
without downloading the whole data from the cloud. Data protection is achieved by a local proxy, a logical
entity located on the client side. To verify the integrity of data stored in the cloud, many CS auditing
schemes [27], [28] were proposed. One private key generator (PKG) was proposed to verify identities and
produce m for all clients, and one third party auditor (TPA) is used by clients to check the integrity of cloud
data [29]. This approach is unsuitable for large numbers of clients, since the PKG and the TPA may not be
able to handle the heavy workload. Later, the idea of combining linear error-correcting codes with linear
homomorphic authentication schemes was proposed to ensure security [30]. This combination uses just a
single extra block to achieve error tolerance and authentication simultaneously. In this work, brute force
attacks (BFA) are studied, together with how to resist the BFA and realize DD with strong security
protection in CS auditing.


2. RESEARCH METHOD 

The domain-user integra tag (DUIT) has three phases: data upload (DU), integra deduplication
(ID), and data download, as shown in Figure 1. The main aim of the DUIT is PP and DD across
different domains. To achieve DD with strong PP, the DUIT method produces the tag ω_1 and employs
a new strategy to produce the key for file encryption. In the first phase, the trusted key data
structure (KDS) generates K_sym (m) for clients, a public key (n) for the cloud server provider (CSP), and the
corresponding system public parameters (PS). The symbols and notation used in this work are listed in
Table 1.


2.1. Data upload 
For DU into the cloud, intra tag generation and inter and intra DD are carried out in the upload
procedure. For DD, each client from a distinct domain (D = 1, 2, 3, ..., n) needs to generate a distinct intra
tag. The DU phase mainly includes four parts: intra tag generation, intra DD, inter DD, and data
encryption/key recovery. For each client U in D_i, where i = 1, 2, ..., n, when U needs to upload a file FS
of length Length_FS, the user first generates an intra-tag for DD. Then the agent A_i performs
intra-deduplication to detect a duplicate within the same domain D_i. If no duplicate is found,
the CSP further conducts inter-deduplication among the different domains. Finally, if a duplicate is
found, U recovers the CE key created by the first uploader.

When an initial client U_i from D_i needs to upload file FS, U_i chooses a random number
r_m ∈ W_p and generates a random intra-file-tag ω_Di with the EK_sym and K_sym (pk = K_sym, MAC_sym),
as given in (1). The initial client then sends the message “upload || (Length_FS, ω_Di)” for file FS to the
agent, where Length_FS denotes the length of the file. The agent checks whether a duplicate copy of the
file FS exists by comparing ω_Di with the tag values previously stored from D_i.


aa, = (d;"*g"™) (1) 
Figure 1. The DUIT model


Table 1. Symbol and notation

Sl. No  Symbol                      Notation
1       ω_1, ω_2                    File tag, file label
2       m, n                        Private-public key pair
3       EK_sym, FS                  Symmetric encryption key, file
4       L, ω, K_prf, CF             Authenticator set, intra-tag, PRF key, ciphertext
5       HH_1, HH_2, hh_1, hh_2      Hash functions
6       WAR, p, D_i, K_sym          Pseudo-random function, security parameter, i-th domain, private key
7       Length_FS                   Length of file
8       ω_Di, ω_k                   i-th domain intra-tag, length keyword
9       PO_0, PO_1, ..., PO_{n-1}   Pointer
10      c, R', σ, φ                 Ciphertext small block, blinded hash function, measuring parameter, random value


2.2. Intra and inter DD 

If the same intra-file-tag value has already been stored, the agent returns “duplication||CF” to the
initial client. Note that the CT is used to encapsulate some information about EK_sym. Otherwise, the agent
uses the value to produce a random inter-tag ω_Di based on the EK. The agent then stores the intra-file-tag
in its table and sends the message “upload (i, Length_FS, ω_Di)” to the cloud for inter-deduplication, where
i is the identifier of D_i. After receiving the message “upload (i, Length_FS, ω_Di)” from D_i, the cloud
performs inter-deduplication to further eliminate redundant data. In our integra search mechanism,
a DD tree search approach is used to search for duplicate files. In algorithm 1, the input is the data
length and a node. The search for duplicated data is driven by the length keyword and the node: each node
is searched according to the length of the data. If a leaf node holds the duplicate keyword, that node is
selected. Otherwise, the pointers (PO_0, ω_1, PO_1, ω_2, PO_2, ..., ω_{n-1}, PO_{n-1}) are
used to point to the data from the root node downward. Note that in intra-deduplication, the agent A_i
judges whether a duplicate exists by comparing R', as shown in algorithm 2. However, in
inter-deduplication, the same data corresponds to different inter-tags. Accordingly, the CS cannot compare
the R' of the inter-tags to confirm the duplicate.
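
The agent-side intra-deduplication step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `Agent`, the table layout, the placeholder ciphertext reference, and in particular the keyed hash used for the inter-tag are all hypothetical stand-ins (the scheme itself verifies inter-tags with a pairing equation, which this sketch does not reproduce).

```python
import hashlib
import hmac
from typing import Dict, Tuple


class Agent:
    """Agent A_i of domain D_i holding the intra-file-tag table (hypothetical layout)."""

    def __init__(self, domain_id: int, secret: bytes) -> None:
        self.domain_id = domain_id
        self._secret = secret
        # intra-file-tag -> reference to the stored ciphertext CF
        self.tag_table: Dict[bytes, str] = {}

    def inter_tag(self, intra_tag: bytes) -> bytes:
        # Stand-in only: a per-domain keyed hash, so the same data maps to
        # different inter-tags in different domains. NOT the paper's tag.
        return hmac.new(self._secret, intra_tag, hashlib.sha256).digest()

    def handle_upload(self, length_fs: int, intra_tag: bytes) -> Tuple[str, object]:
        # Duplicate inside the domain: answer "duplication||CF".
        if intra_tag in self.tag_table:
            return ("duplication", self.tag_table[intra_tag])
        # New tag: record it, then forward (i, Length_FS, inter-tag) to the cloud.
        self.tag_table[intra_tag] = f"CF-{len(self.tag_table)}"
        return ("upload", (self.domain_id, length_fs, self.inter_tag(intra_tag)))


agent = Agent(domain_id=1, secret=b"domain-1-secret")
print(agent.handle_upload(1024, b"tag-A")[0])  # upload
print(agent.handle_upload(1024, b"tag-A")[0])  # duplication
```

The second call returns the stored reference without contacting the cloud, which is the saving that intra-deduplication is meant to provide.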


Algorithm 1: Deduplication tree search
1 Input: (Length_FS, node)
2 Search the input (based on data length)
3 The searching node selection process starts from the root node.
  If (node = root node)
      Search toward the leaf node
4 If (node = leaf node)
      Return node
5 Else
      Case 1: Length_FS < ω_1
          Return search(Length_FS, PO_0)
6     Case 2: ω_k < Length_FS < ω_{k+1}
          Return search(Length_FS, PO_k)
7     Case 3: ω_{n-1} < Length_FS
          Return search(Length_FS, PO_{n-1})
  End if


Note that 1 < k < n-1, and the keyword ω_k is the value of the data length Length_FS. Each non-leaf node
contains n-1 keywords and n pointers: (PO_0, ω_1, PO_1, ω_2, PO_2, ..., ω_{n-1}, PO_{n-1}). Pointers are used to
point to the stored data that contains the keyword Length_FS.
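
The search in algorithm 1 can be sketched as a multiway tree descent keyed on file length. This is a minimal sketch under stated assumptions: the `Node` class, its field names, and the helper `length_is_stored` are illustrative and not part of the original scheme, and a length exactly equal to a keyword is sent to the right-hand pointer, a tie-breaking choice the algorithm listing does not specify.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One tree node. Field names are illustrative, not taken from the paper."""
    # Non-leaf: n-1 sorted keywords (omega_1 .. omega_{n-1}).
    # Leaf: the sorted file lengths recorded at that leaf.
    keywords: List[int] = field(default_factory=list)
    # Non-leaf: n child pointers (PO_0 .. PO_{n-1}). Leaf: empty list.
    pointers: List["Node"] = field(default_factory=list)


def search(length_fs: int, node: Node) -> Node:
    """Return the leaf whose key range covers length_fs."""
    if not node.pointers:  # leaf node: stop here
        return node
    # k = number of keywords <= length_fs, so:
    #   length_fs < omega_1                 -> k = 0      (PO_0)
    #   omega_k <= length_fs < omega_{k+1}  -> k          (PO_k)
    #   omega_{n-1} <= length_fs            -> k = n - 1  (PO_{n-1})
    k = bisect_right(node.keywords, length_fs)
    return search(length_fs, node.pointers[k])


def length_is_stored(length_fs: int, root: Node) -> bool:
    """True if a file of this length was recorded, i.e. a candidate duplicate."""
    return length_fs in search(length_fs, root).keywords


# Usage: a root with keywords (30, 50) and three leaves.
leaf0, leaf1, leaf2 = Node([10, 20]), Node([30, 40]), Node([50, 60])
root = Node(keywords=[30, 50], pointers=[leaf0, leaf1, leaf2])
print(length_is_stored(40, root))  # True
print(length_is_stored(45, root))  # False
```

A matching length only marks a candidate; the tag comparison of algorithm 2 is still needed to confirm an actual duplicate.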


Algorithm 2: Inter deduplication
1 Initially, the message (i, Length_FS, ω_Di) from users of different
  domains is given to the cloud server.
2 Decision tree approach: after receiving (i, Length_FS, ω_Di), the
  cloud server calls the function search(Length_FS, node) to check
  whether Length_FS has already been stored.
3 If (the value is not stored in the node)
      Return “DU”
  Else
      “No need to upload”
4 If (same value found, check i = j)
      Return (“DU”)
  Else
      Verify
p p 
e( anp gi) = e( angi) (2) 
5 If equation (2) holds then, 
Return (duplication||link) 
Else 
Return (“DU”) 
End if 
End if 
End if 


2.3. Algorithm set up (initial, subsequent user, and agent) 

The main role of the agent is to generate ω_1 and ω_2 for the users in each individual domain (D1,
D2, D3, ...). The tag ω_1 is used to test whether a duplicate file is present in the cloud, while the file
label ω_2 is used to generate the encryption and authentication keys. Each domain has initial and
subsequent clients; the initial client sends ω_1 to the cloud through an agent as the file upload request.
The agent keeps one copy of the ω_1 table, and when files are sent to the cloud through the agent, the agent
checks whether any earlier file has the same ω_1. If no such file is stored in the CS, the initial
client encrypts the file FS with EK_sym. First, the client picks a random number c ∈ W_p* to compute the
blinded hash function R', as given in (3).

The computed R' is sent to the agent A_i. After receiving R' from the initial user, the agent
computes H' = R'^m with its K_sym (m) and returns H' to the initial client. The initial client computes
σ = H'·n^{-c} using the agent public key n, then generates ω_1 and the file label ω_2. Here ω_1 = HH_2(σ||1)
is the tag used for data duplicate checks, and ω_2 = HH_2(σ||2) is the file label used for EK_sym, K_sym, and
MAC_sym.
After ω_1 is generated, it is sent to the agent, which stores the value. If an existing file has the
same ω_1, the agent reminds the initial user not to send the file to the cloud. If no matching ω_1 is stored
at the agent, ω_1 is forwarded to the cloud, which re-checks the ω_1 value. Once the cloud keeps the ω_1
value, the initial user computes the encryption key EK_sym = hh_1(ω_2||FS) and encrypts the file as
C_FS = Enc(FS, EK_sym). The CT is divided into small blocks (c_1, c_2, c_3, ..., c_n).


R' = HH_1(FS) · g^c (3)
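
The blinded tag generation of this section can be illustrated with a small sketch. It assumes the reading R' = HH_1(FS)·g^c, H' = R'^m, σ = H'·n^{-c} with n = g^m, so that the blinding cancels and σ = HH_1(FS)^m. The modulus, the base g, the use of SHA-256 for HH_1, HH_2 and hh_1, and all function names are assumptions chosen for illustration; the toy parameters are not secure and do not match the 512-bit field used in the experiments.

```python
import hashlib
import secrets
from typing import Tuple

# Toy parameters, NOT secure: a Mersenne prime modulus and a small base.
P = 2**127 - 1
G = 3


def hash_to_group(data: bytes) -> int:
    """Stand-in for HH_1: map bytes to a nonzero element modulo P."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 1) + 1


def blind(fs: bytes, c: int) -> int:
    """Client: R' = HH_1(FS) * g^c mod P."""
    return hash_to_group(fs) * pow(G, c, P) % P


def agent_sign(r_blind: int, m: int) -> int:
    """Agent: H' = R'^m mod P, using its private key m."""
    return pow(r_blind, m, P)


def unblind(h_blind: int, n: int, c: int) -> int:
    """Client: sigma = H' * n^(-c) mod P, where n = g^m (needs Python 3.8+)."""
    return h_blind * pow(n, -c, P) % P


def derive_tags(sigma: int) -> Tuple[str, str]:
    """omega_1 = HH_2(sigma || 1) and omega_2 = HH_2(sigma || 2)."""
    s = sigma.to_bytes(16, "big")
    return hashlib.sha256(s + b"1").hexdigest(), hashlib.sha256(s + b"2").hexdigest()


def file_key(omega_2: str, fs: bytes) -> bytes:
    """EK_sym = hh_1(omega_2 || FS)."""
    return hashlib.sha256(omega_2.encode() + fs).digest()


def tags_for(fs: bytes, m: int, n: int) -> Tuple[str, str]:
    """Run one blinded exchange with a fresh random c and return (omega_1, omega_2)."""
    c = secrets.randbelow(P - 2) + 1
    sigma = unblind(agent_sign(blind(fs, c), m), n, c)
    return derive_tags(sigma)


m = secrets.randbelow(P - 2) + 1   # agent private key
n = pow(G, m, P)                   # agent public key
print(tags_for(b"report", m, n) == tags_for(b"report", m, n))  # True
```

Two clients holding the same file obtain the same ω_1 even though each uses a different random c, while the agent only ever sees the blinded value R'. A client who cannot query the agent cannot compute σ offline, which is the property the scheme relies on against brute force guessing.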


2.4. Recovery of data (RD-algorithm) 

In this stage, the data download is performed at the request of the user who holds the key to open
the file. A client who needs to use his file sends a request for the ciphertext CF to the cloud. After
receiving the request, the cloud first confirms whether the client is the data owner of the
ciphertext CF. If so, the cloud sends the ciphertext CF, its corresponding authenticators, and the file tag ω
to the client. Otherwise, the cloud rejects the client's request. After receiving the messages from the cloud,
the client first verifies the message authentication code (MAC) on the file tag ω with his key K_mac. If the
MAC is valid, the client parses ω, decrypts the encrypted portion using his secret key, and
recovers the pseudo-random function (PRF) key K_prf and the random value φ. The client then checks whether the
succeeding authentication equation holds or not. Mien ny Ni = Lrepan] Riis pO + P Nien ci- If the equation 
holds, the client trusts that the ciphertext CF stored in the cloud is intact, then uses EK_sym to decrypt the
ciphertext CF and recovers the file FS: FS = Dec(CF, EK_sym).
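
The client-side order of checks in the recovery phase can be sketched as follows. This is a minimal sketch that assumes HMAC-SHA256 as the MAC, which the paper does not specify; the function names and the `decrypt` callback are hypothetical, and the authenticator equation check is deliberately left out because its exact form is not reproduced here.

```python
import hashlib
import hmac
from typing import Callable


def tag_mac(k_mac: bytes, file_tag: bytes) -> bytes:
    """MAC over the file tag; HMAC-SHA256 is an assumed choice."""
    return hmac.new(k_mac, file_tag, hashlib.sha256).digest()


def recover_file(
    k_mac: bytes,
    file_tag: bytes,
    mac: bytes,
    ciphertext: bytes,
    decrypt: Callable[[bytes], bytes],
) -> bytes:
    """Verify the MAC on the file tag first, and decrypt only if it is valid."""
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag_mac(k_mac, file_tag), mac):
        raise ValueError("invalid MAC on file tag")
    # The authenticator check on the ciphertext blocks would go here.
    return decrypt(ciphertext)


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for Enc/Dec with EK_sym; for illustration only."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


k_mac, ek_sym, tag = b"k" * 16, b"toy-key", b"file-tag"
cf = xor_cipher(b"contents of FS", ek_sym)
fs = recover_file(k_mac, tag, tag_mac(k_mac, tag), cf, lambda c: xor_cipher(c, ek_sym))
print(fs)  # b'contents of FS'
```

Rejecting a bad MAC before any decryption keeps a tampered tag from influencing which keys the client derives.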


3. RESULTS AND DISCUSSION 

The experiment was carried out on an Intel i5 processor with 8 GB of RAM and hard disk sizes of
500 GB and 2 TB. The 64-bit Ubuntu operating system was used to implement the proposed DUIT, and the
data set was taken from [3] together with some virtual machine (VM) images on the CloudSim platform. We
set the base field size to 512 bits, the size of an element in W_p* to |p| = 160 bits, and the size of a data
file to 20 MB, composed of 1,000,000 blocks. The schemes in [11]-[13] cannot achieve strong security
assurance, because data privacy is leaked to the key server. Likewise, [12]-[14] cannot achieve
authenticator DD, which results in substantial storage on the cloud side. The method in [12] exposes the
data, and data stored in the cloud may be corrupted or lost. Table 2 compares the various schemes along
with their parameters. Compared with the existing schemes, the DUIT scheme has better results in all the
parameters.


Table 2. Data integrity, protection comparison

Schemes         Data integrity  Strong PP  Lightweight computation  DD   Authenticator
                auditing                   on the user side              deduplication
[11]            Yes             No         No                       Yes  Yes
[12]            Yes             Yes        Yes                      No   No
[13]            Yes             No         No                       Yes  No
[14]            No              No         No                       Yes  No
Proposed Model  Yes             Yes        Yes                      Yes  Yes


3.1. Computational overhead (CO) 

CO was analyzed on both the client side and the cloud side. The initial and subsequent clients each
incur a cost of 2(Mulp, + 2Expr, + 2hash,;,) to generate ω_1 and the file label, and a cost of
c (PRF; + Addw;) +(c+ 1)Mulw; to verify the integrity of the cloud data. The initial client incurs a cost of
n(PRF; + Mulw; + Addw;) to produce the data authenticators. The subsequent client incurs a cost of
EKsym + cMulw; + Addy, +(c- 1) Addy to convince the cloud and prove that he actually owns the file. The
results are shown in Figures 2(a), 2(b), 2(c) and 2(d). The scheme in [16] directly uses the R' of the data as
the master key to obtain EK_sym. As such, as long as the client has h(m), the EK_sym selected by the initial
uploader can be obtained correctly. Thus, this scheme does not introduce a large number of additional CTs
as the scheme in [3] does. However, this strategy is not resistant to BFA.
Figure 2. Performance outcomes of the DUIT compared with existing strategies: (a) computation cost of
authenticator generation, (b) downloaded data vs computation cost, (c) the number of uploaded data vs
computation cost, and (d) the number of domains vs computation cost


3.2. Storage overhead (SO) 

To analyze the SO, we used the benchmarks of [1], [3] to evaluate DD for CS.
Figures 3(a) and 3(b) show the SO of the DUIT against the existing techniques [1], [3]. They show
that the DUIT achieves a lower storage cost (MB) than the existing schemes; consequently, the DUIT is
more efficient in cloud data storage. Compared with the present procedures [1], [3], the
DUIT attains 27% and 35% improvements in CS cost. Likewise, the computation cost (CC) increases
linearly in the DUIT, as it does in the present techniques [1], [3].

The computation overhead (CO) of CT verification was measured for various numbers of data blocks.
When 100 blocks are tested, the running time of CT verification is 0.091031 ms. The running time
increases to 0.182273 ms when 600 blocks are tested. Computation and storage overhead are shown in Table 3.


3.3. Searching complexity vs stored data 

Searching complexity (SC) vs the number of stored data is shown in Figures 4(a) and 4(b).
As the number of stored data increases, the SC also increases linearly, but it remains very low compared
with the existing procedure [3], because the proposed strategy uses a decision tree approach to search for
duplicate data. In intra deduplication, the SO remains constant as the number of data blocks increases,
whereas in inter deduplication the SC of the DUIT technique is improved by 15% [30]. In [3], the
complexity increases linearly as the number of stored data increases. The DUIT reduces the SC and SO (1)
by constructing a hash table to search for duplicate files.
Figure 4. Performance outcomes of the DUIT compared with existing strategies: (a) intra deduplication
and (b) inter deduplication


4. CONCLUSION 


In this article, a novel DUIT deduplication scheme has been proposed to address user privacy leakage
and the duplication issue when attacks are launched. The DUIT technique uses distinct domains with
initial and subsequent clients for a lightweight CS auditing scheme. Moreover, decision tree-based
deduplication with strong PP was also implemented to achieve data integrity. The results show that our
DUIT technique provides 27% and 35% improvements in CS cost compared with the outsourcing scheme and
the lightweight computation scheme. The SC of the DUIT strategy improves by 10% and 26% compared with
the same two schemes. Hence, the novel DUIT method achieves a low computational cost and lower
searching complexity in the deduplication verification and auditing phase.